WATERMARKING OF DIGITAL IMAGE DATA

Priority is claimed based on U.S. Provisional 
Application No. 60/063,509, filed October 27, 1997. 

Technical Field

This invention relates to providing digital image
data with a watermark and, more particularly, to the case
where the image data are video data.

Background of the Invention 

A conventional watermark, on a paper document, may
consist of a translucent design which is visible when the
document is held to the light. Or, more generally, a
watermark may be viewed under certain lighting conditions
or at certain viewing angles. Such watermarks, which are
difficult to forge, can be included for the sake of
authentication of documents such as bank notes, checks
and stock certificates, for example.

In digital video technology, watermarks are being
used to betoken certain proprietary rights such as a
copyright, for example. Here, the watermark is a visible
or invisible pattern which is superposed on an image, and
which is not readily removable without leaving evidence
of tampering. Resistance to tampering is called
"robustness".

One robust way of including a visible watermark in a
digitized image is described by Braudaway et al.,
"Protecting Publically Available Images with a Visible
Image Watermark", IBM Research Division, T. J. Watson
Research Center, Technical Report 96A000248. A luminance
level, ΔL, is selected for the strength of the watermark,
and the luminance of each individual pixel of the image
is modified by ΔL and a nonlinear function. For
increased security, the level ΔL is randomized over all
the pixels in the image.

Summary of the Invention 

When images are transmitted as transformed by 
discrete cosine transformation (DCT) for compression, 
with or without motion compensation, it is advantageous 
to include a watermark after transformation. To this
end, (i) a DCT watermark is generated for optimal 
visibility based on the original image data, and (ii) the 
generated watermark is superposed on the transformed 
data. 

Brief Description of the Drawing

Fig. 1 is an illustration for motion-compensated 
discrete cosine transformation (MC-DCT).

Fig. 2a is a watermark mask. 

Fig. 2b is an original image. 

Fig. 2c is a superposition of the original image and 
the watermark mask. 

Fig. 3 is a flow diagram of initial processing. 

Fig. 4 is a flow diagram of watermark superposition 
processing. 

Fig. 5 is a flow diagram of scaling for a region. 

Detailed Description

A Mask Generation Module generates a DCT watermark 
mask based on the original video content. A Motion 
Compensation Module efficiently inserts the watermark in 
the DCT domain and outputs a valid video bit stream at a
specified bit rate. The following description applies
specifically to image data in MPEG format. 

MPEG video consists of groups of pictures (GOP) as
described in document ISO/IEC 13818-2 Committee Draft
(MPEG-2). Each GOP starts with an intra coded "I-frame",
followed by a number of forward-referencing "P-frames"
and bidirectionally-referencing "B-frames".

With motion compensation, when a watermark is
inserted in an I-frame, the P- and B-frames in the GOP
will be changed also. To correct for this, the
motion-compensated watermark of the anchor or base frame
must be subtracted when the watermark is added to a
current frame. For such subtraction, the technique of
motion compensation in the DCT domain can be used as
described by S.-F. Chang et al., "Manipulation and
Compositing of MC-DCT Compressed Video", IEEE Journal of
Selected Areas in Communications, Special Issue on
Intelligent Signal Processing, pp. 1-11, January 1995.

In a video sequence, the image content changes from
frame to frame. Thus, to keep a watermark sufficiently
visible throughout the video, the watermark must be
adapted to the video contents. For example, when an
image is complicated or "busy", i.e., when it has many
high-frequency components, the watermark should be
stronger. For different regions in the same video frame,
the watermark should be scaled regionally, thereby
enhancing the security against tampering.

(i) Mask Generation Module 

In this module, as illustrated by Section (i) of
Fig. 4, a watermark mask image is first generated for
each GOP, or for the first P-frame after a scene cut.
This is based on the fact that video content tends to be
consistent within a GOP, which is usually about 15 frames
or 0.5 second long. But, when there is a scene cut
within a GOP, visual content will change significantly,
and a new mask is used to adapt to the new visual
content. Thus, the watermark mask is superposed on the
I-frame, or on the first P-frame after a scene cut.

To generate the mask, as illustrated by Fig. 3, the
input watermark image is first converted to a gray scale
image. Only the luminance channel of each image is
modified. A transparent color (background color) is
chosen. The luminance of all watermark pixels having the
transparent color value is set to 0. Optionally, the
mask image is randomly shifted in both x- and
y-direction. A DCT is applied to obtain the DCT mask of
the watermark. The luminance of the mask will be scaled
adaptively according to the input image content before
adding to the input image.
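By way of illustration only, the first steps of the mask generation can be sketched as follows. The function name `prepare_mask`, the representation of the image as nested lists of RGB tuples, the ITU-R BT.601 luma weights, and the matching of the transparent color on the RGB value (rather than on the gray value) are assumptions of this sketch and are not taken from the description; the optional shift and the DCT are omitted.

```python
def prepare_mask(rgb_pixels, transparent_rgb):
    """Convert a watermark image to a luminance mask.

    rgb_pixels: 2-D list (rows) of (r, g, b) tuples.
    transparent_rgb: the chosen transparent (background) color.
    Pixels having the transparent color get luminance 0.
    """
    def luma(p):
        # BT.601 weights: an assumed gray-scale conversion.
        return 0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2]

    return [[0.0 if p == transparent_rgb else luma(p) for p in row]
            for row in rgb_pixels]


if __name__ == "__main__":
    logo = [[(255, 255, 255), (0, 255, 0)],
            [(0, 0, 0), (0, 255, 0)]]
    print(prepare_mask(logo, (0, 255, 0)))
```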

In the pixel domain, the following formulae have
been proposed in the above-referenced report by G. W.
Braudaway et al.:

$$
w'_{nm} =
\begin{cases}
w_{nm} \cdot \dfrac{y_w}{38.667} \cdot \left( \dfrac{y_{nm}}{y_w} \right)^{2/3} \cdot \Delta L, & y_{nm}/y_w > 0.008856, \\[1ex]
w_{nm} \cdot \dfrac{y_w}{903.3} \cdot \Delta L, & y_{nm}/y_w \le 0.008856
\end{cases}
\qquad (1)
$$

where $w'_{nm}$ is the scaled watermark mask that will be added
to the original image, $w_{nm}$ is the non-transparent
watermark pixel value at pixel (n,m), $y_w$ is the scene
white, $y_{nm}$ is the luminance value of the input image at
image coordinates (n,m), and ΔL is the scaling factor
which controls the watermark strength.
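A minimal sketch of the pixel-domain scaling of Equation 1 is given below for a single pixel. The function name `scale_watermark_pixel` and the default scene white of 235 (the value used later for the MPEG luminance range) are assumptions of this sketch.

```python
def scale_watermark_pixel(w, y, delta_l, y_white=235.0):
    """Scale one non-transparent watermark pixel per Equation 1.

    w: watermark pixel value, y: image luminance at the same position,
    delta_l: strength factor, y_white: scene white (assumed default).
    """
    ratio = y / y_white
    if ratio > 0.008856:
        # Nonlinear branch: strength grows with (y / y_white)^(2/3).
        return w * (y_white / 38.667) * ratio ** (2.0 / 3.0) * delta_l
    # Very dark pixels: constant scaling.
    return w * (y_white / 903.3) * delta_l


if __name__ == "__main__":
    for y in (1.0, 50.0, 200.0):
        print(y, scale_watermark_pixel(100.0, y, 0.5))
```

The brighter the underlying pixel, the larger the scaled watermark value, which is the behavior the formulae are intended to produce.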

In accordance with an aspect of the present
invention, for scaling in the DCT domain, a stochastic
approximation can be used. If $y_{nm}$ and $w_{nm}$ are considered
as independent random variables, if y is normalized to
the luminance range used in MPEG, namely from [0, 255] to
[16, 235], and if $y_w$ = 235, then, based on Equation 1,
the expected values of w' are

$$
E[w'] =
\begin{cases}
0.1607 \, E[w] \, E[y^{2/3}] \, \Delta L, & y > 17.9319, \\
0.2602 \, E[w] \, \Delta L, & y \le 17.9319.
\end{cases}
\qquad (2)
$$

Assuming that y has a normal distribution with mean
α and variance β², the $E[y^{2/3}]$-term in Equation (2) can be
represented as

$$
E[y^{2/3}] = \int_{17.9319}^{235} y^{2/3} \, \frac{1}{\sqrt{2\pi}\,\beta} \, e^{-(y-\alpha)^2 / (2\beta^2)} \, dy
\qquad (3)
$$

Thus, $E[y^{2/3}]$ is a function of the mean and the variance
of the pixel values.
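The dependence on mean and variance can be illustrated numerically. The following sketch evaluates the integral of a normal density weighted by $y^{2/3}$ over [17.9319, 235] with the midpoint rule; the function name `expected_y_two_thirds` and the number of integration steps are assumptions chosen for illustration.

```python
import math


def expected_y_two_thirds(alpha, beta, lo=17.9319, hi=235.0, steps=2000):
    """Midpoint-rule estimate of E[y^(2/3)] for y ~ Normal(alpha, beta^2),
    integrated over [lo, hi]."""
    if beta <= 0.0:
        raise ValueError("beta must be positive")
    h = (hi - lo) / steps
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * beta)
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        density = norm * math.exp(-((y - alpha) ** 2) / (2.0 * beta ** 2))
        total += y ** (2.0 / 3.0) * density
    return total * h


if __name__ == "__main__":
    # For a narrow distribution the result approaches alpha^(2/3).
    print(expected_y_two_thirds(128.0, 2.0), 128.0 ** (2.0 / 3.0))
```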

Equation (2) specifies a relationship between the
moments of random variables w, w' and y. This
relationship can be extended to the deterministic case to
simplify Equation (2), resulting in a linear
approximation.

For each 8 by 8 image block, the mean and variance
of the block are used to approximate α and β² in
Equation 3, and the mean α is used to approximate y in
deciding which of the formulae to use in Equation 2.

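$$
w'_{ijk} =
\begin{cases}
0.1607 \, w_{ijk} \, f(\alpha_{ij}, \beta_{ij}) \, \Delta L, & \alpha_{ij} > 17.9319, \\
0.2602 \, w_{ijk} \, \Delta L, & \alpha_{ij} \le 17.9319
\end{cases}
\qquad (4)
$$

where, for k = 0, ..., 63, $w_{ijk}$ is the k-th pixel of the
i,j-th 8 by 8 block in the watermark image, $w'_{ijk}$ is the
corresponding pixel of the scaled watermark, and
$f(\alpha_{ij}, \beta_{ij})$ denotes the $E[y^{2/3}]$-term of Equation 3
evaluated for the mean and variance of the block.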

Equation 4 approximates the nonlinear function
according to Equation 2, by linear functions block by
block. The scaled watermark strength depends on the mean
and variance of the image block. For each image block,
the higher the mean (i.e. the brighter), and the higher
the variance (i.e. the more cluttered), the greater the
required strength of the watermark for maintaining
consistent visibility of the watermark.
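The block-by-block linear scaling can be sketched as follows. The names `block_stats` and `scale_watermark_block` are hypothetical, blocks are assumed to be flat lists of 64 luminance values, and the function `f` that supplies the $E[y^{2/3}]$ term is passed in by the caller. In the usage example, `f` is replaced by the zero-variance limit `alpha ** (2/3)`, which is an assumption made only to keep the example short.

```python
def block_stats(block):
    """Mean and standard deviation of a flat list of luminance values."""
    n = len(block)
    mean = sum(block) / n
    var = sum((p - mean) ** 2 for p in block) / n
    return mean, var ** 0.5


def scale_watermark_block(w_block, y_block, delta_l, f):
    """Scale one watermark block by a single gain derived from the image block.

    f(alpha, beta) must return the E[y^(2/3)] term for the block.
    """
    alpha, beta = block_stats(y_block)
    if alpha > 17.9319:
        gain = 0.1607 * f(alpha, beta) * delta_l
    else:
        gain = 0.2602 * delta_l
    return [gain * w for w in w_block]


if __name__ == "__main__":
    f0 = lambda a, b: a ** (2.0 / 3.0)  # stand-in for the integral term
    bright = scale_watermark_block([100.0] * 64, [200.0] * 64, 0.5, f0)
    dark = scale_watermark_block([100.0] * 64, [16.0] * 64, 0.5, f0)
    print(bright[0], dark[0])
```

Because the gain is constant within a block, the scaling is linear in the watermark pixels, so it commutes with the DCT of the block.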
The DCT of Equation 4 can be used to obtain the DCT
of the watermark mask, which can be inserted in the image
in the DCT domain. The mean and variance of the input
image can be derived from the DCT coefficients,

$$
\alpha = \frac{Y_{DC}}{8} \quad \text{and} \quad \beta^2 = \frac{1}{64} \sum_{k=1}^{63} Y_{AC,k}^2
\qquad (5)
$$

where $Y_{DC}$ and $Y_{AC}$ are DC- and AC-DCT coefficients,
respectively, of the image block Y.

A new watermark mask is calculated for each I-frame
and P-frame, the latter in case of a scene cut. For
I-frames, all DCT coefficients are readily accessible after
minimal decoding of the MPEG sequence, i.e. inverse
variable length coding, inverse run length coding and
inverse quantization. For P-frames, since most blocks are
intra coded after a scene cut, their DCT coefficients can
be used immediately. For non-intra coded blocks, the
average DC and AC energy obtained from intra coded blocks
are substituted.

For further speed-up, the block-based (α_ij, β_ij) pair
can be replaced by the average (α, β) over the whole image
or over certain regions. In the following, a
multi-region approach is described.

The input image can be separated into many
rectangular regions. As illustrated by Fig. 5, for each
region an (α, β) pair is calculated, and the mask is
generated accordingly. Typically, the watermark is
divided into top and bottom regions. This is suitable
for most outdoor views with sky in the upper half of the
frame and darker scenery in the lower half, as shown in
Fig. 2a, for example. Each region will have a relatively
visible watermark using different (α, β) pairs.

To enhance the security of the watermark further, a
randomized location shift can be applied to the watermark
image before applying the DCT. This makes removal of the
watermark more difficult for attackers who are in
possession of the original watermark image, e.g. when a
known logo is used for watermark purposes. Sub-pixel
randomized location shifting will make it very difficult
for the attacker to remove the watermark without leaving
some error residue.
The following can be used for shifting. Two random
numbers, for x- and y-direction, respectively, are
generated and normalized to lie between -1.00 and 1.00. In
the spatial domain, sub-pixel shifting is effected by
bi-linear interpolation which involves only linear scaling
and addition. In the DCT domain, a similar bi-linear
operation can be used.
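The spatial-domain variant of the shift can be sketched as follows. The names `shift_bilinear` and `random_subpixel_shift`, the list-of-rows image layout and the zero fill outside the image are assumptions of this sketch; the DCT-domain counterpart is not shown.

```python
import math
import random


def shift_bilinear(img, dx, dy, fill=0.0):
    """Shift a 2-D image (list of rows) by (dx, dy) pixels using
    bi-linear interpolation; positions outside the image read as `fill`."""
    h, w = len(img), len(img[0])

    def px(r, c):
        return img[r][c] if 0 <= r < h and 0 <= c < w else fill

    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            sx, sy = c - dx, r - dy              # source position
            x0, y0 = math.floor(sx), math.floor(sy)
            fx, fy = sx - x0, sy - y0            # fractional parts
            out[r][c] = ((1 - fx) * (1 - fy) * px(y0, x0)
                         + fx * (1 - fy) * px(y0, x0 + 1)
                         + (1 - fx) * fy * px(y0 + 1, x0)
                         + fx * fy * px(y0 + 1, x0 + 1))
    return out


def random_subpixel_shift(img, rng=random):
    """Draw offsets in [-1, 1] for x and y and apply the shift."""
    dx = rng.uniform(-1.0, 1.0)
    dy = rng.uniform(-1.0, 1.0)
    return shift_bilinear(img, dx, dy), (dx, dy)


if __name__ == "__main__":
    mask = [[0.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 0.0]]
    print(shift_bilinear(mask, 0.5, 0.0))
```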

(ii) Motion Compensation Module 

Once the DCT blocks of the watermark have been
obtained, they are inserted into the DCT frames of the
input video in one of three ways, as illustrated by Fig.
4, Section (ii). For I-frames or intra coded blocks in
B- or P-frames, the DCT of the scaled watermark is added
directly:

$$
E'_{ij} = E_{ij} + W'_{ij} \qquad (7)
$$

where $E'_{ij}$ is the i,j-th resulting DCT block, $E_{ij}$ the
original DCT block, and $W'_{ij}$ the scaled watermark DCT
according to Equation 6.

For blocks with a forward motion vector in a P-frame, or
a backward motion vector only in a B-frame, the watermark
added in the anchor frame has to be removed when adding
the current watermark. The resulting DCT error residue
is:

$$
E'_{ij} = E_{ij} - \mathrm{MCDCT}(W'_F, V_{Fij}) + W'_{ij} \qquad (8)
$$

where MCDCT is the motion compensation function in the DCT
domain as described in the above-referenced paper by S.-F.
Chang et al., $W'_F$ is the watermark DCT used in the forward
anchor frame, and $V_{Fij}$ is the forward motion vector, as
shown in Fig. 1.

WO 99/22480    PCT/US98/22790

For bidirectional predicted blocks in a B-frame, both
forward and backward motion compensation has to be
averaged and subtracted when adding the current watermark:

$$
E'_{ij} = E_{ij} - \frac{\mathrm{MCDCT}(W'_F, V_{Fij}) + \mathrm{MCDCT}(W'_B, V_{Bij})}{2} + W'_{ij} \qquad (9)
$$

where $V_F$ and $V_B$ are forward and backward motion vectors,
respectively, as shown in Fig. 1.

For skipped blocks, which are the 0-motion, 0-residue
error blocks in B- and P-frames, no operations are
necessary, as the watermark inserted in the anchor frame
will be carried over.
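The block-type cases described above can be summarized in a short sketch. The function `insert_watermark_block`, its `kind` labels and the flat-list block layout are hypothetical; the motion-compensated watermark terms are assumed to have been computed beforehand by a DCT-domain motion compensation routine, which is not implemented here.

```python
def insert_watermark_block(kind, e, w_cur, mc_a=None, mc_b=None):
    """Return the watermarked DCT block for one block.

    kind:  'intra', 'single' (one motion vector), 'bidirectional' or 'skipped'
    e:     original DCT block (coefficients or error residue), flat list
    w_cur: scaled watermark DCT for the current frame
    mc_a:  motion-compensated anchor watermark (precomputed)
    mc_b:  second motion-compensated anchor watermark, bidirectional case only
    """
    if kind == "skipped":
        # Watermark of the anchor frame is carried over unchanged.
        return list(e)
    if kind == "intra":
        return [x + w for x, w in zip(e, w_cur)]
    if kind == "single":
        return [x - m + w for x, m, w in zip(e, mc_a, w_cur)]
    if kind == "bidirectional":
        return [x - (a + b) / 2.0 + w
                for x, a, b, w in zip(e, mc_a, mc_b, w_cur)]
    raise ValueError("unknown block kind: %r" % (kind,))


if __name__ == "__main__":
    residue = [10.0, -2.0, 0.5]
    mark = [1.0, 2.0, 3.0]
    # If the compensated anchor watermark equals the current one,
    # the residue is left unchanged.
    print(insert_watermark_block("single", residue, mark, mc_a=mark))
```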

For control of the final bit rate one or more of the
following features can be included:

1. Quantize/inverse-quantize the DCT coefficients of
the watermark so that most high-frequency coefficients
will become zero. The result is a coarser watermark,
using fewer bits.

2. Cut off high-frequency coefficients. The effect
is similar to low-pass filtering in the pixel domain.
There results a smoother watermark with more rounded
edges.

3. Motion vector selection, setting the motion
vector of a macroblock in a P-frame to 0 when the error
residue from using motion compensation of this motion
vector is larger than without its use.

If the motion vector is used, the residual error is

$$
E'_{ij} = E_{ij} - \mathrm{MCDCT}(W'_F, V_{Fij}) + W'_{ij};
$$

otherwise set $V_{Fij}$ = 0.

$$
E''_{ij} = E_{ij} - \mathrm{MCDCT}(I_F, V_{Fij}) + W'_{ij}
$$

where $I_F$ is the DCT of the anchor frame.
If $|E''_{ij}| < |E'_{ij}|$, set $V_{Fij}$ = 0.
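Features 1 and 2 can be sketched as follows for one 8 by 8 watermark DCT block. The function names, the uniform quantizer step, and the cutoff rule based on the sum of the horizontal and vertical frequency indices are assumptions of this sketch; an MPEG encoder would normally use its own quantization matrix.

```python
def requantize(coeffs, step):
    """Quantize and inverse-quantize DCT coefficients with a uniform step,
    so that small (typically high-frequency) coefficients become zero."""
    return [[round(c / step) * step for c in row] for row in coeffs]


def cut_high_frequencies(coeffs, max_order):
    """Zero every coefficient whose index sum u + v exceeds max_order."""
    return [[c if u + v <= max_order else 0.0 for v, c in enumerate(row)]
            for u, row in enumerate(coeffs)]


if __name__ == "__main__":
    block = [[float(64 // (1 + u + v)) for v in range(8)] for u in range(8)]
    coarse = requantize(block, 16.0)
    smooth = cut_high_frequencies(block, 2)
    print(sum(1 for row in coarse for c in row if c != 0.0))
    print(sum(1 for row in smooth for c in row if c != 0.0))
```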

Figs. 2a, 2b and 2c illustrate the use of the
adaptive watermarking techniques. Fig. 2a shows the
original watermark mask. While a binary version is shown
here, the algorithm is capable of handling gray scale with
any specified transparent color. Fig. 2b shows an
original image. Fig. 2c shows the new watermarked image.

The watermarking algorithm was tested on an HP J210
workstation, achieving a rate of 6 frames/second. Most of
the computational effort went into the MC-DCT operations.
If all possible MC-DCT blocks were precomputed, real time
performance would be possible. This would require 12
megabytes of memory for 352x240 image size.

In accordance with an aspect of the invention,
preferred watermarks offer robustness in that they are not
easily defeated or removed by tampering. For example, if
a watermark is inserted in MPEG video by the method
described above, it would be necessary to recover the
watermark mask, estimate the embedding locations by
extensive sub-pixel block matching, and then estimate the
(α, β) factors for each watermark region. In experiments,
there always remained noticeable traces in the tampered
video, which can be used to reject false claims of
ownership and to deter piracy.

For robustness, a watermark should not be binary, but
should have texture which is similar to that of the scene
on which it is placed. This can be accomplished by
arbitrarily choosing an I-frame from the scene, decoding
it by inverse DCT transform to obtain pixel values, and
masking out the watermark from the decoded video frame.

When there is camera motion such as panning and
zooming in a video sequence, an inserted watermark may be
defeated by applying video mosaicing, i.e. by assembling a
large image from small portions of multiple image frames.
The watermark then can be filtered out as an outlier.
However, this technique will fail when there are actually
moving objects in the foreground, as the watermark will be
embedded in the moving foreground objects as well. As a
countermeasure in accordance with a further embodiment of
the invention, a watermark can be used which appears
static relative to over-all or background motion. Such a
watermark can be obtained by estimating the
camera motion using a 2-D affine model, and then
translating and scaling the watermark using the estimated
camera motion. The affine model can be described as
follows:

The motion vectors in MPEG are usually generated by block
matching: finding a block in the reference frame so that the
mean square error is minimized. Although the motion vectors
do not represent the true optical flow, they are still good in
most cases for estimating the camera parameters in sequences
that do not contain large dark or uniform regions.

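When the distance between the object/background and the
camera is large, it is usually sufficient to use a 6 parameter
affine transform to describe the global motion of the current
frame,

$$
\begin{bmatrix} u \\ v \end{bmatrix} =
\begin{bmatrix} 1 & x & y & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & x & y \end{bmatrix}
\begin{bmatrix} a_1 & a_2 & a_3 & a_4 & a_5 & a_6 \end{bmatrix}^T
$$

where (x,y) is the coordinate of a macroblock in the current
frame, $[u \; v]^T$ is the motion vector associated with that
macroblock, and $[a_1 \; a_2 \; a_3 \; a_4 \; a_5 \; a_6]^T$ is the affine transform vector.
We denote $U$ for $[u \; v]^T$, $X$ for
$\begin{bmatrix} 1 & x & y & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & x & y \end{bmatrix}$, and $\vec{a}$ for
$[a_1 \; a_2 \; a_3 \; a_4 \; a_5 \; a_6]^T$.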



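Given the motion vector for each macroblock, we find the
global parameter using the Least Squares (LS) estimation,
that is to find a set of parameters $\vec{a}$ to minimize the error
between the motion vectors estimated by the affine model and the
actual motion vectors obtained from the MPEG stream,

$$
S(\vec{a}) = \sum_x \sum_y \left[ (\hat{u}_{xy} - u_{xy})^2 + (\hat{v}_{xy} - v_{xy})^2 \right]
$$

where $[\hat{u}_{xy} \; \hat{v}_{xy}]^T$ is the estimated motion vector. To solve for $\vec{a}$,
set the first derivative of $S(\vec{a})$ to 0, then we get

$$
\Big( \sum_x \sum_y X^T X \Big) \, \vec{a} = \sum_x \sum_y X^T U
$$

where, among the sums forming the matrix on the left,

$$
C = \sum_x \sum_y x^2, \quad D = \sum_x \sum_y y^2, \quad E = \sum_x \sum_y x \cdot y.
$$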
All summations are over all valid macro-blocks whose
motion vectors survive after the nonlinear noise reduction
process. After the first LS estimation, motion vectors
that have large distance from the estimated ones are
filtered out before a second LS estimation. The
estimation process is iterated several times to refine the
accuracy.
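A minimal sketch of the least-squares estimation with iterated outlier rejection is given below. It assumes the six-parameter model u = a1 + a2·x + a3·y, v = a4 + a5·x + a6·y and solves the two resulting 3 by 3 normal-equation systems by Cramer's rule. The function names, the number of refinement rounds and the distance threshold `max_dist` are assumptions chosen for illustration.

```python
def predict(a, x, y):
    """Motion vector predicted by the affine parameters a = [a1..a6]."""
    return a[0] + a[1] * x + a[2] * y, a[3] + a[4] * x + a[5] * y


def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, b):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate macroblock layout")
    return [_det3([[b[r] if c == k else m[r][c] for c in range(3)]
                   for r in range(3)]) / d for k in range(3)]


def fit_affine(samples):
    """Least-squares fit; samples is a list of (x, y, u, v) tuples."""
    n = float(len(samples))
    sx = sum(s[0] for s in samples)
    sy = sum(s[1] for s in samples)
    sxx = sum(s[0] * s[0] for s in samples)
    syy = sum(s[1] * s[1] for s in samples)
    sxy = sum(s[0] * s[1] for s in samples)
    m = [[n, sx, sy], [sx, sxx, sxy], [sy, sxy, syy]]
    bu = [sum(s[2] for s in samples),
          sum(s[2] * s[0] for s in samples),
          sum(s[2] * s[1] for s in samples)]
    bv = [sum(s[3] for s in samples),
          sum(s[3] * s[0] for s in samples),
          sum(s[3] * s[1] for s in samples)]
    return _solve3(m, bu) + _solve3(m, bv)


def fit_affine_robust(samples, rounds=3, max_dist=2.0):
    """Refit after discarding vectors far from the current estimate."""
    a = fit_affine(samples)
    for _ in range(rounds):
        inliers = []
        for (x, y, u, v) in samples:
            pu, pv = predict(a, x, y)
            if ((pu - u) ** 2 + (pv - v) ** 2) ** 0.5 <= max_dist:
                inliers.append((x, y, u, v))
        if len(inliers) < 3:
            break
        a = fit_affine(inliers)
    return a
```

Here `samples` would hold one entry per valid macroblock, with (x, y) the macroblock coordinate and (u, v) its motion vector taken from the MPEG stream.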

Dominant motion can be estimated using clustering as
follows:

For each B- or P-frame, obtain the forward motion
vectors.

Assign each motion vector to one of a number (e.g. 4)
of pre-defined classes.

Perform one round of global affine parameter
estimation.

Assign the global affine parameter to the first class
and assign zero to all other classes.

Iterate a number of times, e.g. 20, or until the
residual error is stabilized: assign each motion vector
to the class that minimizes Euclidean distance, and
recalculate the 2-D affine parameters for each class
using its member motion vectors.
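The clustering steps above can be sketched as follows. The function `cluster_motion` is hypothetical and takes the motion model as caller-supplied `fit` and `predict` functions. Iteration stops when the class assignment no longer changes, which is used here as a simple stand-in for a stabilized residual error. In the usage example a purely translational model (the mean motion vector of a class) replaces the affine fit, solely to keep the example short.

```python
def cluster_motion(samples, fit, predict, zero_model, n_classes=4, max_iters=20):
    """Cluster motion vectors into classes, each with its own motion model.

    samples:    list of (x, y, u, v) tuples
    fit:        function(list of samples) -> model
    predict:    function(model, x, y) -> (u, v)
    zero_model: model that predicts zero motion
    """
    # Global estimate goes to the first class, zero motion to the others.
    models = [fit(samples)] + [zero_model] * (n_classes - 1)
    labels = None
    for _ in range(max_iters):
        new_labels = []
        for (x, y, u, v) in samples:
            dists = []
            for m in models:
                pu, pv = predict(m, x, y)
                dists.append((pu - u) ** 2 + (pv - v) ** 2)
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(n_classes):
            members = [s for s, lab in zip(samples, labels) if lab == k]
            if members:  # an empty class keeps its previous model
                models[k] = fit(members)
    return labels, models


def mean_vector(members):
    """Translational stand-in for the affine fit: mean motion vector."""
    n = float(len(members))
    return (sum(s[2] for s in members) / n, sum(s[3] for s in members) / n)


if __name__ == "__main__":
    pan = [(x, 0, 5.0, 0.0) for x in range(10)]
    static = [(x, 1, 0.0, 0.0) for x in range(4)]
    labels, models = cluster_motion(pan + static, mean_vector,
                                    lambda m, x, y: m, (0.0, 0.0))
    print(labels, models[0])
```

With an affine model, a class would need at least three non-collinear member vectors before its parameters can be recalculated.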
CLAIMS

1. A method for including a watermark in a digital 
image, comprising: 

obtaining digital data of a transformed 
representation of the image; 

determining a transformed representation of the 
watermark for optimized visibility of the watermark in the 
image; and

superposing the transformed representation of the 
watermark on the transformed representation of the image. 

2. The method in accordance with claim 1, wherein 
the transformed representation of the image is a 
compressed representation. 

3. The method in accordance with claim 1, wherein 
the transformed representation of the image is a discrete
cosine transformed representation. 

4. The method in accordance with claim 1, wherein 
the image is one of a sequence of video images. 

5. The method in accordance with claim 3, wherein 
the transformed representation includes motion 
compensation. 